Jointly estimating epidemiological dynamics of Covid-19 from case and wastewater data in Aotearoa New Zealand

Background: Timely and informed public health responses to infectious diseases such as COVID-19 necessitate reliable information about infection dynamics. The case ascertainment rate (CAR), the proportion of infections that are reported as cases, is typically much less than one and varies with testing practices and behaviours, making reported cases unreliable as the sole source of data. The concentration of viral RNA in wastewater samples provides an alternative measure of infection prevalence that is not affected by clinical testing, healthcare-seeking behaviour, or access to care.

Methods: We construct a state-space model with observed data of levels of SARS-CoV-2 in wastewater and reported case incidence and estimate the hidden states of the effective reproduction number, R, and CAR using sequential Monte Carlo methods.

Results: We analyse data from 1 January 2022 to 31 March 2023 from Aotearoa New Zealand. Our model estimates that R peaks at 2.76 (95% CrI 2.20, 3.83) around 18 February 2022 and that the CAR peaks around 12 March 2022. We calculate that New Zealand's second Omicron wave in July 2022 was similar in size to the first, despite fewer reported cases. We estimate that the CAR in the BA.5 Omicron wave in July 2022 was approximately 50% lower than in the BA.1/BA.2 Omicron wave in March 2022.

Conclusions: Estimating R, CAR, and the cumulative number of infections provides useful information for planning public health responses and understanding the state of immunity in the population. This model is a useful disease surveillance tool, improving situational awareness of infectious disease dynamics in real time.


Fixed-Lag Bootstrap Filter
We have hidden states $\mathrm{CAR}_t$, $R_t$, and $I_t$, observed data $W_t$ and $C_t$, and fixed parameter vector $\theta$. We are interested in learning the joint posterior distribution $P(\mathrm{CAR}_{1:T}, R_{1:T}, I_{1:T}, \theta \mid W_{1:T}, C_{1:T})$. The goal of algorithm 1 (below) is to construct a set of particles $\{(\mathrm{CAR}_t^{(i)}, R_t^{(i)}, I_t^{(i)}) : i = 1, \ldots, N_x\}$ that approximates this distribution.
Writing $X_t = (\mathrm{CAR}_t, R_t, I_t)$ and $y_t = (W_t, C_t)$, the target distribution can be decomposed recursively as:
$$P_\theta(X_{1:t} \mid y_{1:t}) \propto P_\theta(y_t \mid X_t)\, P_\theta(X_t \mid X_{1:t-1})\, P_\theta(X_{1:t-1} \mid y_{1:t-1}).$$
The decomposition makes two assumptions: (1) $y_t$ is conditionally independent of $y_{1:t-1}$ given $X_t$, and (2) $X_t$ is conditionally independent of $y_{1:t-1}$ given $X_{1:t-1}$. The definition of the observation and state-space transition distributions requires further assumptions that are clear from their definitions.
We define our importance sampling distribution $\pi$ in a similar recursive fashion:
$$\pi(X_{1:t} \mid y_{1:t}) = \pi(X_t \mid X_{1:t-1}, y_{1:t})\, \pi(X_{1:t-1} \mid y_{1:t-1}),$$
requiring the non-restrictive assumption that our importance distribution on $X_{1:t-1}$ does not depend on $y_t$. This further allows us to define the importance sampling weights recursively:
$$w_t^{(i)} \propto \frac{P_\theta(y_t \mid X_t^{(i)})\, P_\theta(X_t^{(i)} \mid X_{1:t-1}^{(i)})}{\pi(X_t^{(i)} \mid X_{1:t-1}^{(i)}, y_{1:t})}\, w_{t-1}^{(i)}.$$
Finally, we choose our importance sampling distribution $\pi(X_t \mid X_{1:t-1}, y_{1:t}) = P_\theta(X_t \mid X_{1:t-1})$, allowing these terms to be cancelled, and giving the recursion for the particle weights:
$$w_t^{(i)} \propto P_\theta(y_t \mid X_t^{(i)})\, w_{t-1}^{(i)}.$$
At each time step, we re-sample with replacement from our current particles according to $w_t^{(i)}$, so the selected particles at time step $t$ can be viewed as direct draws from $P_\theta(X_t \mid y_{1:t})$.
Additionally, re-sampling the $h$ most recent time steps re-weights past particles according to $P_\theta(y_t \mid X_{1:t}^{(i)})$, so they can be viewed as samples from $P_\theta(X_t \mid y_{1:t+h})$. We note that in doing this resampling we break the particle ancestry, so we only have samples from the marginal distributions $P_\theta(X_t \mid y_{1:t+h}) \approx P_\theta(X_t \mid y_{1:T})$ and not the full joint distribution over state trajectories $P_\theta(X_{1:t} \mid y_{1:t+h})$ (except for the final $h$ states).
Conditional on a fixed value of $\theta$, algorithm 1 presents our fixed-lag bootstrap filter. This assumes a default wind-in period equal to $h$ (30 days), although this can be changed as necessary.

Algorithm 1 Fixed-lag bootstrap filter
Input: parameter vector $\theta$, data $W_{1:T}$ and $C_{1:T}$
Sample $\mathrm{CAR}_{-h:0}^{(i)}$, $R_{-h:0}^{(i)}$, and $I_{-h:0}^{(i)}$ from some initial state distribution
for $t = 1, \ldots, T$ do
    Sample $\mathrm{CAR}_t^{(i)}$, $R_t^{(i)}$, and $I_t^{(i)}$ from the state transition distribution
    Calculate weights $w_t^{(i)} \propto P_\theta(W_t, C_t \mid X_{1:t}^{(i)})$
    Re-sample $(\mathrm{CAR}_{t-h:t}^{(i)}, R_{t-h:t}^{(i)}, I_{t-h:t}^{(i)})$ with replacement according to $w_t^{(i)}$
end for
Return $(\mathrm{CAR}_{1:T}^{(i)}, R_{1:T}^{(i)}, I_{1:T}^{(i)})$ for $i = 1, \ldots, N_x$
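To make the mechanics concrete, the following is a minimal Python sketch of a fixed-lag bootstrap filter. The Gaussian random-walk transition and Gaussian observation model are illustrative placeholders, not the epidemiological model used in this paper; only the propagate/weight/fixed-lag-resample structure follows algorithm 1.

# Minimal fixed-lag bootstrap filter sketch; the transition and observation
# models below are illustrative placeholders, not the paper's model.
import numpy as np

rng = np.random.default_rng(0)

def transition(X):
    # Placeholder state transition: Gaussian random walk.
    return X + rng.normal(0.0, 0.1, size=X.shape)

def obs_loglik(y, X):
    # Placeholder observation model: y_t ~ N(X_t, 1).
    return -0.5 * (y - X) ** 2

def fixed_lag_bootstrap_filter(y, N=1000, h=30):
    T = len(y)
    hist = np.empty((T + 1, N))                 # hist[t, i]: particle i at time t
    hist[0] = rng.normal(0.0, 1.0, size=N)      # initial state distribution
    loglik = 0.0
    for t in range(1, T + 1):
        hist[t] = transition(hist[t - 1])       # propagate
        logw = obs_loglik(y[t - 1], hist[t])    # weight by observation likelihood
        m = logw.max()
        w = np.exp(logw - m)
        loglik += m + np.log(w.mean())          # accumulate log-likelihood estimate
        idx = rng.choice(N, size=N, p=w / w.sum())
        lag = max(0, t - h)
        hist[lag:t + 1] = hist[lag:t + 1, idx]  # resample only the last h steps
    return hist[1:], loglik

y = np.cumsum(rng.normal(size=100))             # synthetic observations
particles, ll = fixed_lag_bootstrap_filter(y)
print(f"log-likelihood estimate: {ll:.1f}")

In the paper's model the particle state would be the triple $(\mathrm{CAR}_t, R_t, I_t)$ and the observation likelihood would involve convolutions of past infections, but the filtering skeleton is the same.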

Practical Considerations for the Bootstrap Filter
Choosing the Fixed-Lag h

In an ideal world we would resample the entire state history at every step, producing direct samples from our desired posterior distribution $P(X_t \mid y_{1:T})$. In fact, this also admits samples from the joint posterior distribution across all time steps, $P(X_{1:T} \mid y_{1:T})$, which would allow us to sample individual trajectories. However, as $T$ increases, the number of unique particles that remain at the earlier time steps (most obviously at $t = 1$) gets increasingly small. Eventually only a handful of unique particles, or even a single one, will remain for $X_1$, which provides a very poor approximation of $P(X_1 \mid y_{1:T})$. This can be somewhat overcome by increasing the number of particles $N_x$, but given present computing power this quickly becomes impractical. Instead we re-sample only the most recent $h$ time steps, which means our particles at time step $t$ are samples from $P(X_t \mid y_{1:t+h})$ whenever $t + h < T$. The value of $h$ needs to be large enough that $P(X_t \mid y_{1:t+h}) \approx P(X_t \mid y_{1:T})$ while being small enough that we avoid particle degeneracy for reasonable values of $N_x$.

Calculating Cumulative Infections
In Figure 4 of the main text we present estimates of cumulative infections $CI_t$. If we were resampling entire particle histories at each time step (rather than fixed-lag resampling) we would be able to set
$$CI_t^{(i)} = \sum_{u=1}^{t} I_u^{(i)}$$
for our $i$th sample of cumulative infections at time $t$. However, our fixed-lag resampling breaks the state histories, so this approach is invalid. Instead, we augment cumulative infections as an additional hidden state, at each time step setting
$$CI_t^{(i)} = CI_{t-1}^{(i)} + I_t^{(i)}$$
and resampling as usual. As the particle filter produces samples from $P(CI_t \mid y_{1:t+h})$, this method produces valid estimates of the pointwise cumulative infections $CI_t$.
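A minimal sketch of this augmentation, with stand-in dynamics and weights (the variable names are ours, for illustration): the key point is that the cumulative-infection state is stored and fixed-lag resampled jointly with the other states, so each particle's $CI_t$ remains consistent with its own resampled infection history.

# Cumulative infections carried as an augmented hidden state and resampled
# jointly with the infection particles; dynamics and weights are stand-ins.
import numpy as np

rng = np.random.default_rng(1)
N, T, h = 500, 100, 30
I_hist = np.empty((T, N))        # infection particles
CI_hist = np.empty((T, N))       # cumulative infections (augmented state)

ci = np.zeros(N)
for t in range(T):
    I_t = rng.poisson(50.0, size=N).astype(float)  # stand-in transition draw
    ci = ci + I_t                                  # CI_t = CI_{t-1} + I_t
    I_hist[t], CI_hist[t] = I_t, ci
    w = rng.random(N)                              # stand-in particle weights
    idx = rng.choice(N, size=N, p=w / w.sum())
    lag = max(0, t - h)
    I_hist[lag:t + 1] = I_hist[lag:t + 1, idx]     # fixed-lag resample...
    CI_hist[lag:t + 1] = CI_hist[lag:t + 1, idx]   # ...applied jointly to CI
    ci = CI_hist[t]                                # carry resampled CI forward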

Wind-In Period
In practice we wind-in using two steps: (1) hidden states are randomly allocated values for $t = -h$ to $t = 0$, and (2) the filter is run for $t = 1$ to $t = k$ for some $k$. We present results and calculate likelihoods for $t \geq k + 1$. The first step is necessary for there to be sufficient state history to calculate the expected values of $I_t$, $C_t$, and $W_t$, which all involve convolutions of past infections. However, only a few of these randomly allocated trajectories will be plausible, leading to considerable uncertainty in the initial estimates of our hidden states. Thus we use data to filter particles, as described above, in the period $0 \leq t \leq k$ and start the estimation window at $t = k$, so that all particle chains have plausible past trajectories at this time. In general $k$ should be chosen to be greater than $h$.
This second wind-in period means that the estimation window only begins $k$ days after the start of the period for which data are available. It is possible to run the algorithm without the second wind-in period, which may be necessary when data are limited. However, this leads to greater uncertainty about estimated states in the early part of the time period and introduces substantial additional variation in the estimates of the model log-likelihood (section 1.2.3). In practice, we use $k = 50$, except for model runs starting on 1 January 2022, where we use $k = 31$ as the earliest available data were from 1 December 2021.

Likelihood Estimation
Thus far we have focused on estimating the value of the hidden states given some known parameter vector $\theta$. Our particle filtering algorithm also admits a tidy, albeit noisy, method for estimating the likelihood function $L(\theta \mid y_{1:T}) \propto P(y_{1:T} \mid \theta)$ [5]. First note that:
$$P(y_{1:T} \mid \theta) = P_\theta(y_1) \prod_{t=2}^{T} P_\theta(y_t \mid y_{1:t-1}).$$
We can write each term in the product as:
$$P_\theta(y_t \mid y_{1:t-1}) = \int P_\theta(y_t \mid X_{t-h:t})\, P_\theta(X_{t-h:t} \mid y_{1:t-1})\, dX_{t-h:t}.$$
Note $P_\theta(y_t \mid X_{t-h:t})$ is the observation distribution, and the particles at time $t$ (after propagation but before resampling) provide an approximation to $P_\theta(X_{t-h:t} \mid y_{1:t-1})$. Together this conveniently allows us to approximate this integral using:
$$\hat{P}_\theta(y_t \mid y_{1:t-1}) = \frac{1}{N_x} \sum_{i=1}^{N_x} P_\theta(y_t \mid X_{t-h:t}^{(i)}).$$
Taking logarithms gives us our estimator of the log-likelihood:
$$\hat{\ell}(\theta \mid y_{1:T}) = \sum_{t=1}^{T} \log \left[ \frac{1}{N_x} \sum_{i=1}^{N_x} P_\theta(y_t \mid X_{t-h:t}^{(i)}) \right].$$
As each term inside the outer sum is an approximation, the noise of this estimator grows with the length of the data. In general this noise increases linearly with time, as does the time it takes to run a single filter, so the computational requirements scale approximately as $O(T^2)$ [6].
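A minimal sketch of this estimator, assuming we have stored the per-particle observation log-likelihoods computed before each resampling step; the log-sum-exp shift is for numerical stability only:

# obs_logliks[t, i] = log P_theta(y_t | X^{(i)}_{t-h:t}), recorded before
# resampling at each step; returns the log-likelihood estimate.
import numpy as np

def estimate_loglik(obs_logliks):
    m = obs_logliks.max(axis=1, keepdims=True)   # stabilising shift
    per_step = m[:, 0] + np.log(np.exp(obs_logliks - m).mean(axis=1))
    return per_step.sum()                        # sum over t of log mean_i P(y_t | X_i)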

Particle Marginal Metropolis-Hastings
Particle marginal Metropolis-Hastings (PMMH; algorithm 2) is an established algorithm designed to estimate the joint posterior distribution $P(X_{1:T}, \theta \mid y_{1:T})$ of the hidden states and fixed parameter vector given our data, although in practice we use this method to estimate the marginal posterior distribution $P(\theta \mid y_{1:T})$. The algorithm uses the following proposal density:
$$q\left( (X'_{1:T}, \theta') \mid (X_{1:T}, \theta) \right) = q(\theta' \mid \theta)\, \hat{P}_{\theta'}(X'_{1:T} \mid y_{1:T}),$$
where $X'_{1:T}$ are generated by running a particle filter at $\theta'$. This gives an acceptance probability of:
$$a = \min\left\{ 1, \frac{\hat{P}(y_{1:T} \mid \theta')\, P(\theta')\, q(\theta \mid \theta')}{\hat{P}(y_{1:T} \mid \theta)\, P(\theta)\, q(\theta' \mid \theta)} \right\},$$
where $q(\theta' \mid \theta)$ is our proposal density on the parameters and $\hat{P}(y_{1:T} \mid \theta')$ is an estimate of the model evidence at parameter vector $\theta'$; this is the exponential of $\hat{\ell}(\theta' \mid y_{1:T})$ described in section 1.2.3. The validity of using an estimate of the likelihood rather than an exact calculation, which is a key difference between PMMH and standard Metropolis-Hastings, is confirmed in [5].

Algorithm 2 Particle Marginal Metropolis-Hastings
Require: prior distribution $P(\theta)$ on $\theta$, proposal density $q(\theta' \mid \theta)$, and number of MCMC steps $N$
Initialise $\theta_0 \sim P(\theta)$ and run algorithm 1 to estimate $\hat{P}_{\theta_0} = \hat{P}(y_{1:T} \mid \theta_0)$
for $i = 1, \ldots, N$ do
    Propose $\theta' \sim q(\theta' \mid \theta_{i-1})$
    Run algorithm 1 to estimate $\hat{P}_{\theta'} = \hat{P}(y_{1:T} \mid \theta')$
    Calculate acceptance probability $a_i = \min\left\{1, \frac{\hat{P}_{\theta'}\, P(\theta')\, q(\theta_{i-1} \mid \theta')}{\hat{P}_{\theta_{i-1}}\, P(\theta_{i-1})\, q(\theta' \mid \theta_{i-1})}\right\}$
    Let $\theta_i = \theta'$ with probability $a_i$, else let $\theta_i = \theta_{i-1}$
end for
Return $(\mathrm{CAR}_{1:T}^{(i)}, R_{1:T}^{(i)}, I_{1:T}^{(i)}, \theta_i)$ for $i = 1, \ldots, N$

We use wide independent uniform distributions for our prior distributions on $\sigma_R$, $k_c$, and $k_w$. A wide uniform prior distribution could also be placed on $\sigma_{CAR}$; however, this results in a relatively high-valued posterior estimate for this parameter, as the model can choose values of $\mathrm{CAR}_t$ that closely fit the fluctuations in reported case data. We want our estimates of $\mathrm{CAR}_t$ to reflect an underlying reporting rate, rather than the daily noise in reporting, so we use a prior distribution on $\sigma_{CAR}$ to ensure this. For time periods encompassing 1 April 2022 to 31 March 2023 we use a normal distribution with mean 0.006 and standard deviation 0.00204, truncated on $(0, \infty)$, which has a 95th quantile of 0.01. For the first time period, encompassing 1 January 2022 to 31 March 2022, we use a higher-mean normal distribution with mean 0.024 and standard deviation 0.00816, truncated on $(0, \infty)$, which has a 95th quantile of 0.04. The use of a higher-mean prior distribution for $\sigma_{CAR}$ in this first period allows the model to fit the rapid change in $\mathrm{CAR}_t$ that is thought to have occurred when RATs were rolled out in February 2022 [4]. Table 1 reports our choices of prior distributions.
We use independent normal proposal densities for each parameter. The chosen standard deviations of the proposal densities are given in Table 2. We outline how we chose these in section 1.2.5.
Supplementary Table 1: Prior distributions on parameters. All normal distributions (represented by $N$) are truncated on $(0, \infty)$. The continuous uniform distribution is represented by $U$. The period starting 1 April 2022 continues until the end of the considered period on 31 March 2023.

Practical Considerations for Particle Marginal Metropolis-Hastings
In situations where the proposal is rejected, it is not necessary to re-estimate $\hat{P}_{\theta_{i-1}}$; one can typically let $\hat{P}_{\theta_i} = \hat{P}_{\theta_{i-1}}$. In some situations, particularly when the variance of the likelihood estimator is large, the Markov chain can get stuck at values of $\theta$ where the estimate of $\hat{P}_\theta$ was unusually high, resulting in slower convergence. To avoid this we re-estimate $\hat{P}_{\theta_{i-1}}$ if $n$ consecutive proposals have been rejected. Choosing $n$ requires balancing the computational cost of running additional particle filters against the cost of slower-mixing chains. We use $n = 5$ in this work.
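A minimal PMMH sketch incorporating this refresh rule, assuming run_filter_loglik is a noisy particle-filter estimator of $\log P(y_{1:T} \mid \theta)$ (for example, the filter sketched earlier) and log_prior returns $-\infty$ outside the support, which handles the truncation at zero; names and defaults are ours:

# PMMH with independent normal proposals (symmetric, so the q terms cancel)
# and re-estimation of the current log-likelihood after n_refresh
# consecutive rejections, as described above.
import numpy as np

rng = np.random.default_rng(2)

def pmmh(run_filter_loglik, log_prior, theta0, prop_sd, n_steps, n_refresh=5):
    theta = np.asarray(theta0, dtype=float)
    ll = run_filter_loglik(theta)
    chain, rejects = [], 0
    for _ in range(n_steps):
        theta_prop = theta + rng.normal(0.0, prop_sd, size=theta.shape)
        ll_prop = run_filter_loglik(theta_prop)
        log_a = (ll_prop + log_prior(theta_prop)) - (ll + log_prior(theta))
        if np.log(rng.random()) < log_a:
            theta, ll, rejects = theta_prop, ll_prop, 0
        else:
            rejects += 1
            if rejects >= n_refresh:       # avoid sticking at an over-estimated ll
                ll = run_filter_loglik(theta)
                rejects = 0
        chain.append(theta.copy())
    return np.array(chain)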
Theoretically, this algorithm works irrespective of the number of particles $N_x$ used in each filter; however, this choice does have a substantial impact on the performance of the algorithm. A general heuristic is that $N_x$ should be chosen such that the standard deviation of $\hat{\ell}(\theta \mid y_{1:T})$ is around 1.2-1.3 [6]. This standard deviation is a function of $\theta$ itself (generally speaking, estimates of $\hat{\ell}$ at more likely values of $\theta$ have lower standard deviations), so choosing the ideal $N_x$ is not a simple task. We use $N_x = 10^5$ particles per filter when fitting to three-month time periods.
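A quick way to apply this heuristic is to run the filter repeatedly at a fixed, plausible $\theta$ and check the spread of the resulting log-likelihood estimates, increasing $N_x$ until the standard deviation is in the target range; a sketch, reusing the assumed run_filter_loglik from above:

import numpy as np

def loglik_sd(run_filter_loglik, theta, n_reps=20):
    # Sample standard deviation of repeated log-likelihood estimates at theta;
    # the heuristic targets a value of roughly 1.2-1.3.
    return np.std([run_filter_loglik(theta) for _ in range(n_reps)], ddof=1)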
PMMH is a computationally expensive algorithm. Our results rely on 8 PMMH chains for each of the five time periods considered. Whilst running this on high-performance computing services allows us to run all 40 chains simultaneously on separate cores, it can still take multiple days to generate sufficient samples. We find that, with $N_x = 10^5$, it takes approximately 20 hours to generate 2000 samples (although this can be faster if more modern central processing units are used). As seen in section 2.5, this can be enough to meet certain convergence criteria, even if it is fewer samples than most MCMC algorithms target. Practically, we expect this to make little difference to our posterior estimates of $\theta$, and even less of a difference to our posterior estimates of $X_{1:T}$. This final point can be seen by noting that the marginal posterior estimates of $X_{1:T}$ are very similar to the posterior estimates of $X_{1:T}$ conditional on any plausible value of $\theta$: the majority of the uncertainty comes from the relationship between the hidden states and the observed data, rather than from the hyper-parameters that characterise this relationship.
Despite generally using wide uniform prior distributions, we want to ensure our chains start at plausible values of $\theta$, otherwise considerable computation time must be spent on a wind-in period. Therefore, we first computed approximate bivariate heatmaps of the estimated log-likelihood on a coarse parameter mesh (see Supplementary Material sec. 2.2). We then initialised chains in the part of parameter space with relatively high likelihood values. As well as reducing convergence times, this technique provides some reassurance that the PMMH algorithm is not missing any hidden modes and that the posterior distribution found is consistent with these empirical likelihood estimates.
We ran the PMMH algorithm twice for each three-month block. First we did a training run, with seemingly plausible values for the parameter proposal variances, to provide a crude estimate of the posterior variance. Then a second, final run was performed using the heuristic proposal standard deviation of $2.38\, \sigma_{\text{posterior}} / \sqrt{n_{\text{dim}}}$, where $\sigma_{\text{posterior}}$ is the estimated posterior standard deviation of the chosen parameter from the first run, and $n_{\text{dim}} = 4$ is the dimensionality of the parameter space. This heuristic could be replaced with adaptive MCMC methods, which are well known but slightly more complicated to implement from scratch.
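For example, if the training run gave an estimated posterior standard deviation of 0.002 for a parameter (a hypothetical value, for illustration), the final-run proposal standard deviation would be $2.38 \times 0.002 / \sqrt{4} = 0.00238$.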

Posterior Distribution on Hidden States
One way of estimating the posterior distribution of the hidden states is to store one (or more) trajectories from each PMMH step. The resulting set of particle trajectories are samples from the pointwise marginal posterior distribution $P(X_t \mid y_{1:T})$. However, as we fit the parameters in three-month windows, the PMMH method only outputs trajectories in three-month blocks, which cannot easily be joined together.
Instead, we first run algorithm 2 to generate a set of fixed parameter values $\{\theta_i\}_{i=1}^{N_c} \sim P(\theta \mid y_{1:T})$. We then sample from $P(X_{1:T} \mid y_{1:T})$ by iteratively sampling $\theta^*$ uniformly from $\{\theta_i\}_{i=1}^{N_c}$, running algorithm 1 with $N_x$ particles at $\theta = \theta^*$, and keeping $N_s$ trajectories (where $N_s \leq N_x$) from the output. Repeating this $N_c$ times (once for each parameter sample) gives a set of $N_s N_c$ particle trajectories that approximate the pointwise posterior distribution $P(X_t \mid y_{1:t+h}) \approx P(X_t \mid y_{1:T})$. Note that each sample from $\{\theta_i\}_{i=1}^{N_c}$ consists of five independent sets of values for the inferred parameters ($\sigma_{CAR}$, $\sigma_R$, $k_c$, and $k_w$), one for each of the five three-month periods. When we run algorithm 1 for the whole 15-month period, we assume that the values of these four parameters change instantaneously from one three-month block to the next.
Typically we want to run this with sufficient unique draws from $P(\theta \mid y_{1:T})$ (say $N_c \geq 100$) to appropriately account for uncertainty in $\theta$. The number of samples retained from each iteration should be chosen so that the overall number of trajectories is sufficiently large; we choose $N_c N_s = 2 \times 10^6$. Finally, as we are less concerned by minor degeneracy in individual particle filters, the number of particles used in each filter, $N_x$, can be smaller than if we were running the filter only once. For our results, $N_x = 10^5$ worked well, but this needs to be tailored to individual purposes.
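A minimal sketch of this procedure, assuming theta_samples holds the PMMH draws and run_filter(theta, N_x) returns an array of particle trajectories; names and signatures are ours, not from the paper's code:

# Draw theta* uniformly from the PMMH output, run the filter at theta*, and
# keep N_s of the N_x trajectories; repeat N_c times.
import numpy as np

rng = np.random.default_rng(3)

def sample_posterior_trajectories(theta_samples, run_filter, N_c, N_s, N_x):
    kept = []
    for _ in range(N_c):
        theta_star = theta_samples[rng.integers(len(theta_samples))]
        trajectories = run_filter(theta_star, N_x)       # shape (N_x, T)
        keep = rng.choice(N_x, size=N_s, replace=False)  # retain N_s trajectories
        kept.append(trajectories[keep])
    return np.concatenate(kept)                          # N_c * N_s trajectories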
When presenting results we use the mean of the samples as the central estimate and the 2.5th and 97.5th quantiles to represent our 95% credible intervals (CrI).

Pre-Determined Parameters
In addition to the estimated parameters, there are others that we fix. These are the generation time distribution $g_u$, the infection-to-reporting distribution $L_u$, the infection-to-shedding distribution $\omega_u$, and the average total genome copies per infection $\alpha$. We also pre-specify the fixed-lag resampling window $h = 30$.
The generation time is assumed to be a discretised Gamma random variable with mean 3.3 days and standard deviation 1.3 days [4,7-9]. The infection-to-reporting and infection-to-shedding distributions are calculated as convolutions of an incubation period (infection-to-onset) distribution (Weibull with mean 2.9 days and standard deviation 2.0 days [10]) with an onset-to-reporting distribution (estimated from New Zealand data, mean 1.8 days and standard deviation 1.8 days) or an onset-to-shedding distribution (mean 0.7 days and standard deviation 2.6 days [1]), respectively. The Gamma and Weibull distributions were discretised by taking their value at integer times and normalising. All of these distributions are presented in Supplementary Figure S1.
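As an illustration, the discretisation can be done by evaluating the continuous density at integer times and normalising; a sketch for the generation time distribution (the Weibull incubation period is handled identically, and the convolutions can use np.convolve; the 14-day support truncation is our choice, for illustration only):

# Discretised generation time: Gamma with mean 3.3 d and sd 1.3 d, evaluated
# at integer days and normalised to sum to one.
import numpy as np
from scipy.stats import gamma

mean, sd = 3.3, 1.3
shape, scale = (mean / sd) ** 2, sd ** 2 / mean   # moment-matched Gamma parameters
u = np.arange(1, 15)                              # support in whole days (truncated)
g = gamma.pdf(u, a=shape, scale=scale)
g = g / g.sum()                                   # normalise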

Estimating Curvewise Extrema
Our primary methods produce pointwise estimates of the hidden states at each time step (e.g. mean and quantiles of the samples at each fixed value of $t$). As the timing of peaks and troughs can be quite variable between particles, using pointwise statistics to quantify the heights and timings of peaks or troughs (e.g. local maxima in the median value across particles) can be misleading [11]. For this purpose, it is important to consider these on a curvewise basis (e.g. the median of the local maxima across particles), as illustrated in the sketch below. This requires samples from the joint posterior distribution $P(X_{s:t} \mid y_{1:T})$ over some fixed window $s{:}t$.
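A minimal sketch of the distinction, assuming trajs is an array of joint posterior trajectories over the window of interest, with one row per trajectory (obtained as described below):

# Pointwise vs curvewise summaries of a peak. Pointwise: maximum of the
# across-trajectory median curve. Curvewise: median and quantiles of each
# trajectory's own maximum (and its timing).
import numpy as np

def peak_summaries(trajs):
    median_curve = np.median(trajs, axis=0)
    pointwise_peak = median_curve.max()
    per_traj_peaks = trajs.max(axis=1)
    per_traj_times = trajs.argmax(axis=1)
    return {
        "pointwise_peak": pointwise_peak,
        "curvewise_peak_median": np.median(per_traj_peaks),
        "curvewise_peak_95CrI": np.quantile(per_traj_peaks, [0.025, 0.975]),
        "curvewise_time_median": np.median(per_traj_times),
    }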
We achieve this by increasing our resampling length $h$ and halting the algorithm when the period of interest is contained in $(T - h, T - 30)$, where $T$ represents the final stopping time. Limiting the lower end of the window to $t \geq T - h$ ensures that complete trajectories are faithfully resampled over the time period of interest. Limiting the upper end to $t \leq T - 30$ ensures we are using states that have been informed by at least 30 days of subsequent data.

For CAR, there were substantial differences between the three different sets of input data (Supplementary Figure 2). When using only reported cases or only wastewater data, the model did not have sufficient information to inform estimates of CAR. As a consequence, estimates were either inaccurate or did not capture the temporal trend and had very wide credible intervals. When reported case information was combined with wastewater data, there was good agreement between the estimated CAR and the true solution, with a relatively narrow credible interval. This illustrates the value of combining wastewater data with reported case information to obtain reliable estimates of changes in CAR over time.

Visualising Log-Likelihood Estimates
As discussed in section 1.2.5, we simulated the model likelihood on a coarse grid of parameter values to get a preliminary estimate of the plausible range of parameter values. We present two example outputs in Figure 3. The left plot shows log-likelihood estimates for $\sigma_R$ for the third estimation window (1 July 2022 to 30 September 2022), while the right plot shows a bivariate heatmap of log-likelihood estimates for $\sigma_{CAR}$ and $k_c$. Code to reproduce these figures is provided in the usefulscripts subfolder at https://github.com/nicsteyn2/NZWastewaterModelling.
Proposal Variances

The parameters $\sigma_R$, $\sigma_{CAR}$, $k_c$, and $k_w$ may change as the epidemiological landscape changes, so we fitted them in five distinct time periods: (1) 1 January 2022 to 31 March 2022, (2) 1 April 2022 to 30 June 2022, (3) 1 July 2022 to 30 September 2022, (4) 1 October 2022 to 31 December 2022, and (5) 1 January 2023 to 31 March 2023. Choosing the duration of these windows requires balancing changing epidemiological dynamics (we expect these parameters to change somewhat over time) against using more data to obtain more precise estimates.
Supplementary Figure 2: Synthetic results. (left) Instantaneous reproduction number, (centre-left) case ascertainment rate, (centre-right) wastewater data in genome copies per person per day (gc/p/d), and (right) reported cases. Results are shown when using (a) only reported cases as input data for the particle filter, (b) only wastewater data, and (c) both reported cases and wastewater data. Solid lines present central estimates. Shaded regions show 95% CrIs on the value of the hidden states (left and centre-left columns) and 95% CrIs on the predictive distribution for wastewater data and reported cases (centre-right and right columns). Black dashed lines indicate the synthetic data. Vertical red lines in hidden state plots (left and centre-left columns) indicate the end of the two wind-in periods (see Supplementary Materials sec. 2.2).

Supplementary Figure 6: The effect of shifting the generation time distribution by one day forward or backward on (a) instantaneous reproduction number, (b) case ascertainment rate, and (c) relative case ascertainment rate.

Supplementary Table 2: The chosen standard deviation for each independent normal proposal distribution.